High WSD Accuracy Using Naive Bayesian Classifier with Rich Features
Authors
Abstract
Word Sense Disambiguation (WSD) is the task of choosing the right sense of an ambiguous word given a context. Naive Bayesian (NB) classifiers are known as one of the best supervised approaches to WSD (Mooney, 1996; Pedersen, 2000), and the model usually uses only a topic context, represented by unordered words in a large context. In this paper, we show that by adding richer knowledge, represented by ordered words in a local context and by collocations, the NB classifier can achieve higher accuracy than the best previously published results. The features were chosen using a forward sequential selection algorithm. Our experiments obtained 92.3% accuracy for four common test words (interest, line, hard, serve). We also tested on a large dataset, the DSO corpus, and obtained accuracies of 66.4% for verbs and 72.7% for nouns.
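To make the three kinds of knowledge named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, a window of two words on each side, a single collocation feature built from the two words adjacent to the target, add-one smoothing, and a tiny invented training set for the word "interest". The names `extract_features`, `NaiveBayesWSD`, `featurize`, and `TRAIN` are hypothetical, and the forward sequential feature selection step is left out.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, window=2):
    """Build topic, local ordered, and collocation features for one target word."""
    words = [t.lower() for t in tokens]
    features = set()
    # Topic context: unordered words from the whole context.
    for i, word in enumerate(words):
        if i != target_index:
            features.add(f"bag={word}")
    # Local context: words tagged with their signed offset from the target.
    for offset in range(-window, window + 1):
        j = target_index + offset
        if offset != 0 and 0 <= j < len(words):
            features.add(f"pos{offset:+d}={words[j]}")
    # Collocation (assumed form): the pair of words adjacent to the target.
    if 0 < target_index < len(words) - 1:
        features.add(f"col={words[target_index - 1]}_{words[target_index + 1]}")
    return features


class NaiveBayesWSD:
    """Naive Bayes over feature sets with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for feature in features:
                self.feature_counts[sense][feature] += 1
                self.vocabulary.add(feature)
        return self

    def predict(self, features):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior of the sense
            denom = (sum(self.feature_counts[sense].values())
                     + self.alpha * len(self.vocabulary))
            for feature in features:
                if feature in self.vocabulary:  # skip features unseen in training
                    numer = self.feature_counts[sense][feature] + self.alpha
                    score += math.log(numer / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def featurize(sentence, target="interest"):
    tokens = sentence.split()
    return extract_features(tokens, tokens.index(target))


# Invented toy data, for illustration only.
TRAIN = [
    ("the bank raised its interest rate again", "money"),
    ("she paid interest on the loan", "money"),
    ("he showed great interest in music", "attention"),
    ("they lost interest in the game", "attention"),
]

model = NaiveBayesWSD().fit([(featurize(s), sense) for s, sense in TRAIN])
print(model.predict(featurize("the interest rate fell")))
print(model.predict(featurize("no interest in the game")))
```

On this toy data the first context is labelled `money` and the second `attention`. Prefixing each feature with its type (`bag=`, `pos+1=`, `col=`) keeps the same word distinct across the topic and local views, which is what lets an ordered neighbor such as "rate" directly after the target count separately from "rate" appearing anywhere in the context.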
Similar resources
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
PKU_HIT: An Event Detection System Based on Instances Expansion and Rich Syntactic Features
This paper describes the PKU_HIT system on event detection in the SemEval-2010 Task. We construct three modules for the three sub-tasks of this evaluation. For target verb WSD, we build a Naïve Bayesian classifier which uses additional training instances expanded from an untagged Chinese corpus automatically. For sentence SRL and event detection, we use a feature-based machine learning method w...
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Improving Word Sense Disambiguation Using Topic Features
This paper presents a novel approach for exploiting the global context for the task of word sense disambiguation (WSD). This is done by using topic features constructed using the latent Dirichlet allocation (LDA) algorithm on unlabeled data. The features are incorporated into a modified naïve Bayes network alongside other features such as part-of-speech of neighboring words, single words in the...
The WSD Development Environment
In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, as well as the results of first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning ...